A Straightforward Framework for Video Retrieval Using CLIP
Authors
Abstract
Video retrieval is a challenging task that aims at matching a text query to a video or vice versa. Most existing approaches to this problem rely on annotations made by users. Although simple, this approach is not always feasible in practice. In this work, we explore the application of the language-image model CLIP to obtain video representations without the need for said annotations. This model was explicitly trained to learn a common space where images and text can be compared. Using various techniques described in this document, we extended its application to videos, obtaining state-of-the-art results on the MSR-VTT and MSVD benchmarks.
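The idea in the abstract, comparing text and video in one shared embedding space, can be illustrated with a minimal sketch. It assumes that per-frame image embeddings and a text embedding are already available from a CLIP-style encoder; here they are synthetic NumPy arrays, and no CLIP model is called. Mean pooling over frames is used as one simple aggregation choice, which is an assumption for illustration and not necessarily the technique the paper describes. The function names `l2_normalize`, `video_embedding`, and `rank_videos` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def video_embedding(frame_embeddings):
    """Mean-pool (num_frames, dim) frame embeddings into one unit vector.

    Assumption: mean pooling stands in for whatever aggregation is used in practice.
    """
    frames = l2_normalize(np.asarray(frame_embeddings, dtype=np.float64))
    return l2_normalize(frames.mean(axis=0))


def rank_videos(text_embedding, video_embeddings):
    """Return (indices sorted by descending cosine similarity, raw scores)."""
    text = l2_normalize(np.asarray(text_embedding, dtype=np.float64))
    videos = l2_normalize(np.asarray(video_embeddings, dtype=np.float64))
    scores = videos @ text
    return np.argsort(-scores), scores


# Synthetic stand-ins for encoder outputs: three videos, five frames each.
rng = np.random.default_rng(0)
dim = 8
centers = np.eye(3, dim)  # one orthogonal "topic" direction per video
videos = np.stack([
    video_embedding(center + 0.05 * rng.normal(size=(5, dim)))
    for center in centers
])

# A text query that lies close to the second video's direction.
query = centers[1] + 0.05 * rng.normal(size=dim)
order, scores = rank_videos(query, videos)
print("ranking:", order.tolist())
```

With real data, the synthetic arrays would be replaced by the image and text embeddings produced by the model, and the same ranking step would apply in either direction (text to video or video to text).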
Similar resources
EMD-Based Video Clip Retrieval by Many-to-Many Matching
This paper presents a new approach for video clip retrieval based on Earth Mover’s Distance (EMD). Instead of imposing one-to-one matching constraint as in [11, 14], our approach allows many-to-many matching methodology and is capable of tolerating errors due to video partitioning and various video editing effects. We formulate clip-based retrieval as a graph matching problem in two stages. In ...
Structuring Indexes for Video Clip
For the multimedia boom to amount to much, it must become easier for producers of such systems to find, manage, and organize large amounts of source materials in a variety of formats. Video materials are, in many ways, the most demanding component format in multimedia systems, being in effect multimedia presentations all on their own. Video clips combine moving images with sound, and may incorp...
A Unified Framework for Video Summarization, Browsing and Retrieval
Video content can be accessed by using either a top-down approach or a bottom-up approach [1, 2, 3, 4]. The top-down approach, i.e. video browsing, is useful when we need to get an “essence” of the content. The bottom-up approach, i.e. video retrieval, is useful when we know exactly what we are looking for in the content, as shown in Fig. 1. In video summarization, what “essence” the summary sh...
A probabilistic framework for semantic video indexing, filtering, and retrieval
Semantic filtering and retrieval of multimedia content is crucial for efficient use of the multimedia data repositories. Video query by semantic keywords is one of the most difficult problems in multimedia data retrieval. The difficulty lies in the mapping between low-level video representation and high-level semantics. We therefore formulate the multimedia content access problem as a multimedi...
A Videography Analysis Framework for Video Retrieval and Summarization
Overview: In this work, we focus on developing features and approaches to represent and analyze videography styles in unconstrained videos. By unconstrained videos, we mean typical consumer videos with significant content complexity and diverse editing artifacts, mostly with long duration. We present an approach for unsupervised videography analysis for unconstrained videos. Intuitively, each v...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-77004-4_1